# @lopezobrador_ Twitter/X Corpus (2009-2026)

## Dataset Overview

This dataset contains metadata for tweets published by Andrés Manuel López Obrador 
(@lopezobrador_) on the Twitter/X platform, collected for academic research on 
democratic erosion in Latin America.

---

## Files Included

| File | Description |
|------|-------------|
| tweets_amlo_ids_publico.csv | Public dataset without tweet text |
| codebook_amlo.docx | Full variable descriptions and methodology |
| scraper.py | Python module for tweet collection via Playwright |
| SCRAP_X_AMLO_v3.R | R orchestration script |

---

## Dataset Structure (tweets_amlo_ids_publico.csv)

| Variable | Type | Description |
|----------|------|-------------|
| id | character | Unique tweet ID (Snowflake format — treat as text, not number) |
| fecha | character | Publication timestamp (ISO 8601, UTC) |
| likes | integer | Like count at time of collection |
| retweets | integer | Retweet count at time of collection |
| replies | integer | Reply count at time of collection |
| es_respuesta | logical | TRUE if tweet is a reply to another user |
| periodo | character | Political period classification |
| fecha_local | datetime | Timestamp in Mexico City local time (UTC-6) |
| tipo_tweet | character | Tweet type: original or respuesta |

NOTE: Tweet text (texto) and URL (url) columns are excluded from this 
public release in accordance with Twitter/X Developer Policy. 
See "Reconstructing Tweet Text" below.
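
A minimal loading sketch using only the Python standard library. It assumes `fecha` looks like `2018-12-01T03:30:00Z` or carries an explicit `+00:00` offset (the exact format is not specified here), and it derives local time with the fixed UTC-6 offset documented for `fecha_local`. The helper name `load_rows`, the `fecha_local_fixed` key, and the sample row are illustrative and not part of the dataset.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Fixed offset as documented for fecha_local (assumption: no DST adjustment)
MEXICO_CITY_FIXED = timezone(timedelta(hours=-6))

def load_rows(handle):
    """Read the public CSV, keeping id as text and parsing fecha as UTC."""
    rows = []
    for row in csv.DictReader(handle):
        # csv yields strings, so the Snowflake id is never coerced to a number
        fecha = datetime.fromisoformat(row["fecha"].replace("Z", "+00:00"))
        if fecha.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC
            fecha = fecha.replace(tzinfo=timezone.utc)
        row["fecha"] = fecha
        row["fecha_local_fixed"] = fecha.astimezone(MEXICO_CITY_FIXED)
        row["likes"] = int(row["likes"])
        rows.append(row)
    return rows

# Illustrative sample (made-up values, not a real row from the dataset)
sample = io.StringIO(
    "id,fecha,likes\n"
    "1068900000000000001,2018-12-01T03:30:00Z,10\n"
)
rows = load_rows(sample)
print(rows[0]["id"], rows[0]["fecha_local_fixed"].isoformat())
```

With the real file, the same function can be called on `open("tweets_amlo_ids_publico.csv", newline="", encoding="utf-8")`. Mexico City observed daylight saving time until October 2022, so summer wall-clock time differed from this fixed offset; whether `fecha_local` accounts for that is not stated here.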

---

## Political Period Classification

| Code | Label | Date Range | Context |
|------|-------|------------|---------|
| P0_prepresidencia | Pre-presidency | Oct 13, 2009 - Nov 30, 2018 | Opposition leader (PRD, then Morena). Presidential campaigns of 2012 (lost) and 2018 (won with 53%); the 2006 campaign (lost) predates the corpus. Founding of Morena party (2014). |
| P1_presidencia | Presidency | Dec 1, 2018 - Sep 30, 2024 | Full presidential term. Daily morning press conferences (mañaneras), COVID-19 response (2020), midterm elections (2021), recall referendum (2022), transition to Claudia Sheinbaum (2024). |
| P2_postpresidencia | Post-presidency | Oct 1, 2024 - present | Near-total digital withdrawal. Only 3 tweets in 17 months — a methodologically significant finding. |
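
The classification above can be sketched as a small function. This is an illustration built from the date ranges in the table, not the code that produced the `periodo` column; the function name `classify_periodo` is hypothetical. Whether the published column was derived from the UTC date or the local date is not stated here, so the function takes a plain calendar date and leaves that choice to the caller.

```python
from datetime import date

# Period boundaries copied from the table above
PRESIDENCY_START = date(2018, 12, 1)
POST_PRESIDENCY_START = date(2024, 10, 1)

def classify_periodo(day):
    """Map a calendar date to the period code used in the periodo column."""
    if day < PRESIDENCY_START:
        return "P0_prepresidencia"
    if day < POST_PRESIDENCY_START:
        return "P1_presidencia"
    return "P2_postpresidencia"

print(classify_periodo(date(2018, 11, 30)))  # P0_prepresidencia
print(classify_periodo(date(2018, 12, 1)))   # P1_presidencia
print(classify_periodo(date(2024, 10, 1)))   # P2_postpresidencia
```

Tweets posted close to midnight on a boundary day can fall into different periods depending on the time zone used, which is worth checking when re-deriving the column.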

---

## Coverage Summary

| Period | Tweets | % of Total |
|--------|--------|------------|
| P0 Pre-presidency | 3,455 | 51.0% |
| P1 Presidency | 3,322 | 49.0% |
| P2 Post-presidency | 3 | 0.04% |
| TOTAL | 6,780 | 100% |

- Content: Original tweets and replies only. Retweets excluded by design.
- Languages: All languages (no filter applied).
- Metrics: Captured at collection time (February-March 2026).
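
As a quick integrity check after download, the per-period counts can be tallied and compared with the table above. The sketch below is illustrative: the helper name `coverage` is hypothetical, and the input list is rebuilt from the published counts rather than read from the file.

```python
from collections import Counter

def coverage(periodos):
    """Return {period: (count, percent of total)} for a list of period codes."""
    counts = Counter(periodos)
    total = sum(counts.values())
    return {p: (n, 100 * n / total) for p, n in counts.items()}

# Rebuild the periodo column from the published counts (stand-in for the CSV)
periodos = (
    ["P0_prepresidencia"] * 3455
    + ["P1_presidencia"] * 3322
    + ["P2_postpresidencia"] * 3
)
summary = coverage(periodos)
for period, (count, percent) in summary.items():
    print(f"{period}: {count} ({percent:.2f}%)")
```

With the real data, `periodos` would be the `periodo` column of the CSV; a mismatch with the table would suggest a truncated or altered download.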

---

## Reconstructing Tweet Text (Hydration)

Tweet text can be reconstructed from the provided IDs using the Twitter/X API:

1. Create a Twitter/X Developer account at developer.twitter.com
2. Apply for API access (Basic tier or higher)
3. Use the GET /2/tweets endpoint with the tweet IDs from this dataset
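4. Important: Tweets deleted after collection will not resolve. In a multi-ID lookup they are omitted from the returned data and listed in the response's `errors` field rather than failing the whole request.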

Example using Python (tweepy v4+, which provides `tweepy.Client` for API v2):

```python
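import pandas as pd
import tweepy

# Twitter/X API v2 client; wait_on_rate_limit pauses when the rate limit is hit
client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN", wait_on_rate_limit=True)

# Read IDs as strings so the Snowflake IDs are never coerced to numbers
df = pd.read_csv("tweets_amlo_ids_publico.csv", dtype={"id": str})
ids = df["id"].tolist()

# Hydrate in batches (max 100 IDs per request)
hydrated = []
for start in range(0, len(ids), 100):
    response = client.get_tweets(
        ids=ids[start:start + 100],
        tweet_fields=["text", "created_at", "author_id"],
    )
    # response.data is None when no ID in the batch resolves
    hydrated.extend(response.data or [])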
```
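
Step 4 above can be handled by comparing the requested IDs with the IDs that actually came back. The sketch below is a plain set comparison and makes no API calls; the helper name `unresolved_ids` is hypothetical. It converts every returned ID to text first, on the assumption that a client library may hand IDs back as integers.

```python
def unresolved_ids(requested, returned):
    """Return the requested IDs (as text) that are missing from the results."""
    returned_as_text = {str(tweet_id) for tweet_id in returned}
    return [tweet_id for tweet_id in requested if str(tweet_id) not in returned_as_text]

# Illustrative values only, not real tweet IDs
requested = ["101", "102", "103"]
returned = [101, "103"]
print(unresolved_ids(requested, returned))  # ['102']
```

Keeping the list of unresolved IDs alongside the hydrated text makes it possible to report how much of the corpus was still retrievable at hydration time.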

Note: Twitter/X Developer Policy section 4.A permits academic redistribution 
of tweet IDs for non-commercial research purposes.

---

## Collection Methodology

Data was collected using a custom pipeline combining R (v4.5.2) and Python (v3.14) 
via the reticulate package. Browser automation was performed with Playwright 
(Python v1.58.0, Chromium) using a persistent session architecture.

Collection strategy:

1. Phase 1: daily sweep, one query per day (Oct 2009 to Mar 2026)
2. Phase 2: sub-daily re-query for days with >= 18 tweets, split into four 6-hour UTC windows
3. Phase 3: verification of critical political event periods
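
The windowing in Phases 1 and 2 can be sketched as follows. This is an illustration of the described strategy under stated assumptions, not the logic in `scraper.py` or `SCRAP_X_AMLO_v3.R`: the function names are hypothetical, the per-day counts are made up, and windows are treated as half-open UTC intervals.

```python
from datetime import date, datetime, timedelta, timezone

SPLIT_THRESHOLD = 18  # days with at least this many tweets are re-queried
WINDOW_HOURS = 6      # four windows per UTC day

def daily_windows(first_day, last_day):
    """Yield one (start, end) UTC window per calendar day, inclusive."""
    day = first_day
    while day <= last_day:
        start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
        yield start, start + timedelta(days=1)
        day += timedelta(days=1)

def six_hour_windows(day):
    """Split one UTC day into four consecutive 6-hour windows."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    step = timedelta(hours=WINDOW_HOURS)
    return [(start + i * step, start + (i + 1) * step)
            for i in range(24 // WINDOW_HOURS)]

def plan_requery(tweets_per_day):
    """Return the sub-daily windows for every day at or above the threshold."""
    plan = []
    for day, count in sorted(tweets_per_day.items()):
        if count >= SPLIT_THRESHOLD:
            plan.extend(six_hour_windows(day))
    return plan

# Made-up counts from a hypothetical Phase 1 sweep
counts = {date(2018, 7, 1): 21, date(2018, 7, 2): 4}
plan = plan_requery(counts)
print(len(plan))  # 4
```

Splitting only the busy days keeps the number of queries close to one per day while giving high-volume days four chances to surface results.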

Known limitation: Twitter/X free-tier accounts are subject to a ~20 result 
display cap per search query. Days with high tweet volume may be underrepresented 
despite the multi-phase strategy.

For complete methodology, see codebook_amlo.docx.

---

## Ethical and Legal Statement

This dataset was collected for non-commercial academic research purposes only.
All content consists of public tweets voluntarily published by a public political figure who served as head of state from 2018 to 2024.
No private accounts or protected content was accessed.

The scripts are published for methodological transparency and reproducibility.
Their use is subject to Twitter/X Terms of Service.

---

## Suggested Citation (APA 7th Edition)

[Author(s)]. (2026). @lopezobrador_ Twitter/X Corpus (2009-2026) [Dataset]. 
Harvard Dataverse. https://doi.org/[REPLACE WITH DOI AFTER PUBLICATION]

---

## License

Creative Commons Attribution 4.0 International (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/

---

## Related Dataset

@nayibbukele Twitter/X Corpus (2012-2026) — available separately on 
Harvard Dataverse. Both datasets share identical structure and were collected 
using the same methodology, enabling direct comparative analysis of left-wing 
and right-wing populism in Latin America.
DOI: [to be added upon publication]
